Overview

Dataset statistics

Number of variables: 14
Number of observations: 244736
Missing cells: 40732
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.1 MiB
Average record size in memory: 112.0 B
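
The percentage and memory figures in the dataset statistics can be cross-checked from the raw counts. The sketch below is illustrative only: the helper name `dataset_summary` is hypothetical, and it assumes that the missing-cell percentage is taken over all cells (variables × observations) and that total memory is the average record size times the number of observations.

```python
def dataset_summary(n_variables, n_observations, missing_cells, record_bytes):
    """Recompute derived overview figures from the raw counts (assumed definitions)."""
    total_cells = n_variables * n_observations
    missing_pct = 100 * missing_cells / total_cells
    total_mib = record_bytes * n_observations / 2**20  # bytes -> MiB
    return total_cells, missing_pct, total_mib


# Figures taken from the dataset statistics above.
total_cells, missing_pct, total_mib = dataset_summary(14, 244736, 40732, 112.0)
print(f"{total_cells} cells, {missing_pct:.1f}% missing, {total_mib:.1f} MiB")
# prints: 3426304 cells, 1.2% missing, 26.1 MiB
```

Both derived values agree with the reported 1.2% and 26.1 MiB.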

Variable types

NUM: 9
CAT: 3
BOOL: 1
DATE: 1

Reproduction

Analysis started: 2020-07-12 21:52:17.743889
Analysis finished: 2020-07-12 21:52:48.234682
Duration: 30.49 seconds
Software version: pandas-profiling v2.9.0rc1
Download configuration: config.yaml

Warnings

VERSIE has constant value "1" in all 244736 rows (Constant)
DATUM_BESTAND has constant value "2020-06-16" in all 244736 rows (Constant)
PEILDATUM has constant value "2020-06-01" in all 244736 rows (Constant)
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values (High cardinality)
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD (High correlation)
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD (High correlation)
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG (High correlation)
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG (High correlation)
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC (High correlation)
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC (High correlation)
GEMIDDELDE_VERKOOPPRIJS has 40732 (16.6%) missing values (Missing)
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.1168) (Skewed)

Variables

VERSIE
Boolean

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value | Count | Frequency (%)
1 | 244736 | 100.0%

DATUM_BESTAND
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value | Count | Frequency (%)
2020-06-16 | 244736 | 100.0%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

PEILDATUM
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value | Count | Frequency (%)
2020-06-01 | 244736 | 100.0%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

JAAR
Date

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2020-01-01 00:00:00
Histogram with fixed size bins (bins=9)

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

Distinct count: 27
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 421.267
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
Median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 918.833
Coefficient of variation (CV): 2.18112
Kurtosis: 71.601
Mean: 421.267
Median Absolute Deviation (MAD): 8
Skewness: 8.5722
Sum: 1.03099e+08
Variance: 844255
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
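Common values

Value | Count | Frequency (%)
305 | 34655 | 14.2%
313 | 31766 | 13.0%
303 | 28291 | 11.6%
330 | 19677 | 8.0%
316 | 16667 | 6.8%
308 | 12302 | 5.0%
306 | 10125 | 4.1%
324 | 10078 | 4.1%
301 | 10029 | 4.1%
304 | 7973 | 3.3%
Other values (17) | 63173 | 25.8%

Minimum 5 values

Value | Count | Frequency (%)
301 | 10029 | 4.1%
302 | 5317 | 2.2%
303 | 28291 | 11.6%
304 | 7973 | 3.3%
305 | 34655 | 14.2%

Maximum 5 values

Value | Count | Frequency (%)
8418 | 3182 | 1.3%
1900 | 161 | 0.1%
390 | 576 | 0.2%
389 | 2678 | 1.1%
362 | 3820 | 1.6%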

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct count: 1766
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB
Value | Count | Frequency (%)
101 | 1031 | 0.4%
402 | 1002 | 0.4%
403 | 983 | 0.4%
301 | 978 | 0.4%
201 | 926 | 0.4%
203 | 923 | 0.4%
401 | 827 | 0.3%
404 | 814 | 0.3%
409 | 805 | 0.3%
802 | 800 | 0.3%
Other values (1756) | 235647 | 96.3%

Length

Max length: 4
Median length: 3
Mean length: 3.34829
Min length: 2

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct count: 5885
Unique (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.38386e+08
Minimum: 1.0501e+07
Maximum: 9.98418e+08
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1.0501e+07
5-th percentile: 2.8999e+07
Q1: 9.9799e+07
Median: 1.49599e+08
Q3: 9.90004e+08
95-th percentile: 9.90416e+08
Maximum: 9.98418e+08
Range: 9.87917e+08
Interquartile range (IQR): 8.90205e+08

Descriptive statistics

Standard deviation: 4.28488e+08
Coefficient of variation (CV): 0.97742
Kurtosis: -1.72574
Mean: 4.38386e+08
Median Absolute Deviation (MAD): 1.196e+08
Skewness: 0.47947
Sum: 1.07289e+14
Variance: 1.83602e+17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
9.90004e+08 | 1801 | 0.7%
9.90004e+08 | 1754 | 0.7%
9.90003e+08 | 1734 | 0.7%
9.90004e+08 | 1377 | 0.6%
9.90356e+08 | 1221 | 0.5%
9.90003e+08 | 1127 | 0.5%
9.90356e+08 | 1125 | 0.5%
1.31999e+08 | 1111 | 0.5%
1.31999e+08 | 1088 | 0.4%
1.99299e+08 | 1034 | 0.4%
Other values (5875) | 231364 | 94.5%

Minimum 5 values

Value | Count | Frequency (%)
1.0501e+07 | 6 | < 0.1%
1.0501e+07 | 9 | < 0.1%
1.0501e+07 | 9 | < 0.1%
1.0501e+07 | 9 | < 0.1%
1.0501e+07 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
9.98418e+08 | 112 | < 0.1%
9.98418e+08 | 98 | < 0.1%
9.98418e+08 | 27 | < 0.1%
9.98418e+08 | 6 | < 0.1%
9.98418e+08 | 5 | < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8593
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 495.397
Minimum: 1
Maximum: 152924
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 13
Q3: 96
95-th percentile: 1658
Maximum: 152924
Range: 152923
Interquartile range (IQR): 94

Descriptive statistics

Standard deviation: 3088.28
Coefficient of variation (CV): 6.23395
Kurtosis: 386.354
Mean: 495.397
Median Absolute Deviation (MAD): 12
Skewness: 16.4588
Sum: 1.21242e+08
Variance: 9.53748e+06
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
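Common values

Value | Count | Frequency (%)
1 | 41601 | 17.0%
2 | 20219 | 8.3%
3 | 13096 | 5.4%
4 | 9689 | 4.0%
5 | 7535 | 3.1%
6 | 6265 | 2.6%
7 | 5193 | 2.1%
8 | 4380 | 1.8%
9 | 4058 | 1.7%
10 | 3534 | 1.4%
Other values (8583) | 129166 | 52.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 41601 | 17.0%
2 | 20219 | 8.3%
3 | 13096 | 5.4%
4 | 9689 | 4.0%
5 | 7535 | 3.1%

Maximum 5 values

Value | Count | Frequency (%)
152924 | 1 | < 0.1%
151865 | 1 | < 0.1%
144569 | 1 | < 0.1%
127253 | 1 | < 0.1%
111784 | 1 | < 0.1%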

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct count: 9136
Unique (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 577.274
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 14
Q3: 105
95-th percentile: 1870
Maximum: 239907
Range: 239906
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 3890.33
Coefficient of variation (CV): 6.73913
Kurtosis: 716.993
Mean: 577.274
Median Absolute Deviation (MAD): 13
Skewness: 21.1168
Sum: 1.4128e+08
Variance: 1.51346e+07
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
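Common values

Value | Count | Frequency (%)
1 | 40161 | 16.4%
2 | 19873 | 8.1%
3 | 12990 | 5.3%
4 | 9529 | 3.9%
5 | 7468 | 3.1%
6 | 6282 | 2.6%
7 | 5184 | 2.1%
8 | 4327 | 1.8%
9 | 3984 | 1.6%
10 | 3556 | 1.5%
Other values (9126) | 131382 | 53.7%

Minimum 5 values

Value | Count | Frequency (%)
1 | 40161 | 16.4%
2 | 19873 | 8.1%
3 | 12990 | 5.3%
4 | 9529 | 3.9%
5 | 7468 | 3.1%

Maximum 5 values

Value | Count | Frequency (%)
239907 | 1 | < 0.1%
232508 | 1 | < 0.1%
231004 | 1 | < 0.1%
227757 | 1 | < 0.1%
219452 | 1 | < 0.1%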

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 7435
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7503.59
Minimum: 1
Maximum: 208905
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 33
Q1: 365
Median: 1620
Q3: 6152
95-th percentile: 36256
Maximum: 208905
Range: 208904
Interquartile range (IQR): 5787

Descriptive statistics

Standard deviation: 17523.2
Coefficient of variation (CV): 2.33531
Kurtosis: 32.4077
Mean: 7503.59
Median Absolute Deviation (MAD): 1492
Skewness: 4.98213
Sum: 1.8364e+09
Variance: 3.07063e+08
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
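Common values

Value | Count | Frequency (%)
4 | 466 | 0.2%
14 | 456 | 0.2%
17 | 454 | 0.2%
8 | 451 | 0.2%
20 | 434 | 0.2%
21 | 431 | 0.2%
12 | 425 | 0.2%
3 | 424 | 0.2%
26 | 422 | 0.2%
9 | 421 | 0.2%
Other values (7425) | 240352 | 98.2%

Minimum 5 values

Value | Count | Frequency (%)
1 | 375 | 0.2%
2 | 411 | 0.2%
3 | 424 | 0.2%
4 | 466 | 0.2%
5 | 408 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
208905 | 19 | < 0.1%
208540 | 25 | < 0.1%
204417 | 17 | < 0.1%
202574 | 17 | < 0.1%
200177 | 16 | < 0.1%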

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8202
Unique (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10523.4
Minimum: 1
Maximum: 338389
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 40
Q1: 470
Median: 2197
Q3: 8555
95-th percentile: 50745
Maximum: 338389
Range: 338388
Interquartile range (IQR): 8085

Descriptive statistics

Standard deviation: 25320.4
Coefficient of variation (CV): 2.40611
Kurtosis: 36.5898
Mean: 10523.4
Median Absolute Deviation (MAD): 2040
Skewness: 5.25404
Sum: 2.57545e+09
Variance: 6.41122e+08
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
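Common values

Value | Count | Frequency (%)
3 | 416 | 0.2%
11 | 395 | 0.2%
4 | 390 | 0.2%
13 | 381 | 0.2%
17 | 374 | 0.2%
5 | 371 | 0.2%
10 | 357 | 0.1%
31 | 348 | 0.1%
6 | 346 | 0.1%
2 | 343 | 0.1%
Other values (8192) | 241015 | 98.5%

Minimum 5 values

Value | Count | Frequency (%)
1 | 332 | 0.1%
2 | 343 | 0.1%
3 | 416 | 0.2%
4 | 390 | 0.2%
5 | 371 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
338389 | 25 | < 0.1%
338076 | 19 | < 0.1%
323541 | 20 | < 0.1%
299417 | 17 | < 0.1%
293998 | 17 | < 0.1%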

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 241
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 657361
Minimum: 39
Maximum: 1.48953e+06
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 39
5-th percentile: 27469
Q1: 248792
Median: 744739
Q3: 995547
95-th percentile: 1.33749e+06
Maximum: 1.48953e+06
Range: 1.48949e+06
Interquartile range (IQR): 746755

Descriptive statistics

Standard deviation: 424374
Coefficient of variation (CV): 0.645573
Kurtosis: -1.11151
Mean: 657361
Median Absolute Deviation (MAD): 315360
Skewness: 0.0157318
Sum: 1.6088e+11
Variance: 1.80093e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
880987 | 5102 | 2.1%
874258 | 4355 | 1.8%
843765 | 4348 | 1.8%
887818 | 4326 | 1.8%
860109 | 4254 | 1.7%
697863 | 3960 | 1.6%
1.08014e+06 | 3892 | 1.6%
1.06621e+06 | 3851 | 1.6%
1.0601e+06 | 3841 | 1.6%
1.04034e+06 | 3810 | 1.6%
Other values (231) | 202997 | 82.9%

Minimum 5 values

Value | Count | Frequency (%)
39 | 4 | < 0.1%
95 | 8 | < 0.1%
141 | 38 | < 0.1%
396 | 44 | < 0.1%
695 | 119 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1.48953e+06 | 2976 | 1.2%
1.45066e+06 | 3054 | 1.2%
1.4219e+06 | 3564 | 1.5%
1.33749e+06 | 3540 | 1.4%
1.33322e+06 | 3547 | 1.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 241
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.03549e+06
Minimum: 39
Maximum: 2.54922e+06
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 39
5-th percentile: 33778
Q1: 364439
Median: 990055
Q3: 1.72779e+06
95-th percentile: 2.18701e+06
Maximum: 2.54922e+06
Range: 2.54919e+06
Interquartile range (IQR): 1.36335e+06

Descriptive statistics

Standard deviation: 721901
Coefficient of variation (CV): 0.697156
Kurtosis: -0.946127
Mean: 1.03549e+06
Median Absolute Deviation (MAD): 652171
Skewness: 0.274463
Sum: 2.53423e+11
Variance: 5.21141e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1.2118e+06 | 5102 | 2.1%
1.28143e+06 | 4355 | 1.8%
1.21594e+06 | 4348 | 1.8%
1.30346e+06 | 4326 | 1.8%
1.26438e+06 | 4254 | 1.7%
983937 | 3960 | 1.6%
2.54922e+06 | 3892 | 1.6%
2.49615e+06 | 3851 | 1.6%
2.54122e+06 | 3841 | 1.6%
2.06818e+06 | 3810 | 1.6%
Other values (231) | 202997 | 82.9%

Minimum 5 values

Value | Count | Frequency (%)
39 | 4 | < 0.1%
95 | 8 | < 0.1%
142 | 38 | < 0.1%
397 | 44 | < 0.1%
696 | 119 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2.54922e+06 | 3892 | 1.6%
2.54122e+06 | 3841 | 1.6%
2.49615e+06 | 3851 | 1.6%
2.18701e+06 | 3757 | 1.5%
2.06818e+06 | 3810 | 1.6%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct count: 3046
Unique (%): 1.5%
Missing: 40732
Missing (%): 16.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3480.45
Minimum: 70
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 70
5-th percentile: 140
Q1: 455
Median: 1220
Q3: 3965
95-th percentile: 13130
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3510

Descriptive statistics

Standard deviation: 6624.36
Coefficient of variation (CV): 1.9033
Kurtosis: 178.214
Mean: 3480.45
Median Absolute Deviation (MAD): 995
Skewness: 8.12602
Sum: 7.10027e+08
Variance: 4.38822e+07
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
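Common values

Value | Count | Frequency (%)
160 | 1755 | 0.7%
105 | 1637 | 0.7%
110 | 1591 | 0.7%
180 | 1347 | 0.6%
300 | 1204 | 0.5%
140 | 1202 | 0.5%
120 | 1159 | 0.5%
145 | 1146 | 0.5%
165 | 1123 | 0.5%
500 | 1068 | 0.4%
Other values (3036) | 190772 | 78.0%
(Missing) | 40732 | 16.6%

Minimum 5 values

Value | Count | Frequency (%)
70 | 226 | 0.1%
75 | 74 | < 0.1%
80 | 360 | 0.1%
85 | 869 | 0.4%
90 | 500 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
287220 | 8 | < 0.1%
148910 | 3 | < 0.1%
142880 | 4 | < 0.1%
122155 | 4 | < 0.1%
116765 | 3 | < 0.1%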

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
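
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code the report generator uses: the function name `pearson_r` is hypothetical, and sample (n - 1) normalisation is assumed for both the covariance and the standard deviations (the choice cancels out in the ratio).

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))                            # perfectly linear, so r is 1 up to rounding
print(pearson_r([10 * a + 3 for a in x], y))      # rescaling and shifting x leaves r unchanged
```

The second call illustrates the invariance under changes of location and scale mentioned above.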

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
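
As a minimal sketch of this definition, the snippet below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption (it is a common convention, but the text above does not specify tie handling).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # the 1/(n - 1) factors cancel


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y))
```

Because `y` increases whenever `x` does, the ranks agree exactly and ρ reaches 1 even though the relationship is not linear.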

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
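
The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated above (often called tau-a); the function name `kendall_tau_a` is hypothetical. Note as an assumption about tooling rather than about this report: statistical libraries commonly apply a tie-corrected variant (tau-b), so values can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau as (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when x and y order the two observations the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped middle values).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6
```

This brute-force version inspects every pair, so it is quadratic in the number of observations and only suitable for small illustrations.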

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

Index | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
0 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999119 | 1 | 1 | 868 | 1374 | 248792 | 378680 | 1460.0
1 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999186 | 8 | 8 | 868 | 1374 | 248792 | 378680 | 495.0
2 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999040 | 19 | 37 | 868 | 1374 | 248792 | 378680 | 1150.0
3 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999206 | 750 | 1057 | 868 | 1374 | 248792 | 378680 | 245.0
4 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999117 | 12 | 12 | 868 | 1374 | 248792 | 378680 | 835.0
5 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999187 | 1 | 1 | 868 | 1374 | 248792 | 378680 | 530.0
6 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999154 | 15 | 15 | 868 | 1374 | 248792 | 378680 | 850.0
7 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999155 | 9 | 9 | 868 | 1374 | 248792 | 378680 | 1170.0
8 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999020 | 5 | 6 | 868 | 1374 | 248792 | 378680 | 2335.0
9 | 1.0 | 2020-06-16 | 2020-06-01 | 2015-01-01 | 324 | 306 | 131999022 | 11 | 12 | 868 | 1374 | 248792 | 378680 | 6700.0

Last rows

Index | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
244726 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630051 | 184 | 189 | 3334 | 5013 | 440630 | 747381 | 1275.0
244727 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630046 | 1 | 1 | 3334 | 5013 | 440630 | 747381 | NaN
244728 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630045 | 56 | 59 | 3334 | 5013 | 440630 | 747381 | 2740.0
244729 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630047 | 28 | 33 | 3334 | 5013 | 440630 | 747381 | NaN
244730 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630070 | 14 | 15 | 3334 | 5013 | 440630 | 747381 | 2660.0
244731 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630044 | 7 | 7 | 3334 | 5013 | 440630 | 747381 | NaN
244732 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630069 | 1 | 1 | 3334 | 5013 | 440630 | 747381 | NaN
244733 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630048 | 1 | 1 | 3334 | 5013 | 440630 | 747381 | NaN
244734 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630053 | 2919 | 4029 | 3334 | 5013 | 440630 | 747381 | 260.0
244735 | 1.0 | 2020-06-16 | 2020-06-01 | 2018-01-01 | 316 | 3520 | 991630052 | 600 | 675 | 3334 | 5013 | 440630 | 747381 | 880.0